Abstract

Recently duplicated genes are believed to often overlap in function and expression. A priori, they are thus less likely to be 
essential. Although this was indeed observed in yeast, mouse singletons and duplicates were reported to be equally often 
essential. This contradiction can only partly be explained by experimental biases. We herein show that older genes 
(i.e., genes with earlier phyletic origin) are more likely to be essential, regardless of their duplication status. At a given 
phyletic gene age, duplicates are always less likely to be essential compared with singletons. The "paradoxical" high 
essentiality among mouse gene duplicates is then caused by different age profiles of singletons and duplicates, with the 
latter tending to be derived from older genes. 
In model organisms such as mouse and yeast, phenotypic
changes caused by single-gene mutations were assayed on 
a genome-wide scale (Kelly et al. 2001; Blake et al. 2011). Of 
particular interest are essential genes, whose removal re- 
sults in death or infertility. Many expressed genes perform- 
ing important molecular functions are nonessential. In 
these cases, it is likely that the gene deletion can be partially 
compensated by another gene with overlapping function 
and expression. 

Gene duplication is believed to be an important source 
of such functional redundancy (Ohno 1970). Accordingly, 
the proportion of essential genes (P_E) among duplicates is
much lower than among singletons in yeast (Gu et al. 
2003). However, this expected trend was not confirmed 
in mouse, where the proportion of essentials among duplicates
is comparable (Liao and Zhang 2007) or even higher
(table 1) than among singletons.

The contradicting results in mouse were initially inter- 
preted as evidence against widespread functional redun- 
dancy of duplicates (Liao and Zhang 2007); this 
interpretation was hotly disputed (Su and Gu 2008; Liang 
and Li 2009; Makino et al. 2009). At that time (Liao and 
Zhang 2007; Su and Gu 2008; Liang and Li 2009; Makino 
et al. 2009), only ~5,000 mouse genes had been tested
in knockout experiments. Biases were expected in this 
subset of mouse genes, as genes with known severe muta- 
tional phenotypes had been selected with higher priority. 
Two follow-up studies (Su and Gu 2008; Makino et al. 2009) 
discovered that the knockout data were further enriched in 
genes derived from old duplications and in developmental 
genes; after correcting these biases, the overall P_E in
duplicates became statistically significantly lower than that in
singletons (Su and Gu 2008; Makino et al. 2009). 

However, the authors did not explore two immediate 
conclusions from their studies (Su and Gu 2008; Makino 
et al. 2009): 1) genes derived from old duplications are more 
likely to be essential than singletons and 2) developmental 
duplicates are more likely to be essential than developmental 
singletons (or indeed singletons as a whole). Both conclu- 
sions hold true in the older as well as the latest versions 
of the mouse phenotype data sets (table 1). This appears 
to again contradict the duplication-functional redundancy 
concept, and we thus consider the issue unresolved. 

What factors other than duplication status affect gene es- 
sentiality? Developmental genes are more likely to be essential 
than nondevelopmental genes (Makino et al. 2009), but this 
should apply to duplicates and singletons alike. It was also 
suggested that hubs in protein-protein interaction networks 
are more likely to be essential (Jeong et al. 2001); however, this 
observation probably reflects biases toward proteins in large 
essential protein complexes (Zotenko et al. 2008). 

Previous studies indicated that the phyletic origin (age) 
of genes, defined by the evolutionarily most distant species 
group where homologs can be found (Wolf et al. 2009), is 
correlated with several gene features (Hao et al. 2010). 
Genes that originated early tend to be conserved across 
species, highly and broadly expressed, and broadly useful 
(Hao et al. 2010). Thus, we hypothesized that knocking 
out phyletically old genes is more likely to have severe phe- 
notypic effects: old genes should be more often essential. 

To test this idea, we classified mouse and yeast genes 
into different age groups according to their earliest phyletic 
Table 1. Proportions of Essential Genes in Different Gene
Categories in the Two Phenotypical Data Sets for Mouse.

                                   Proportion of Essential Genes (%)
Categories                         Current Data Set (a)   Dst3 from Makino et al. (2009)
All genes                          43.3                   42.07
Duplicates                         43.9 (41.6 (b))        41.92
Singletons                         41.1                   42.61
All developmental genes            62.53                  59.51
Developmental duplicates           64.75                  60.9
Developmental singletons           53.1                   53.36
Old duplications (K_s ≥ 2) (c)     47.31                  44.94

(a) MGI 4.4 (October 2010).

(b) If only genes with valid phyletic ages are used.

(c) If a gene has multiple duplicates, all pairwise K_s (the number of synonymous
substitutions per synonymous site) between this gene and its duplicates will be
calculated, and the lowest K_s value is used. Synonymous substitutions in most
genes with K_s > 2 will have reached saturation, and hence, the corresponding
genes will tend to be older than genes with K_s < 2.



origin (see Materials and Methods). We classified genes as 
specific to one of five taxonomic groups for yeast (fig. 1A) 
and six broad taxonomic groups for mouse (fig. 1C). 
Because of the large differences between yeast and mouse, 
we did not attempt any direct cross-species comparisons 
and did not attempt to map their histories onto a common 
timescale. 

We found that within each age group, the P_E among
singletons is always higher than among duplicated genes; this
is true both in mouse and in yeast (fig. 1). Thus, duplicated
genes are indeed less likely to be essential. Furthermore,
for both singletons and duplicated genes, the fraction of 
essential genes increases with increasing age; thus, older 
genes are indeed more likely to be essential (fig. 1). The 
trends observed in figure 1 are reproduced when restricting 
the analysis either to developmental genes or to
nondevelopmental genes (Supplementary figs. S1 and S2,
Supplementary Material online; for the raw data, see
Supplementary table, Supplementary Material online).
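
The stratified comparison described above can be illustrated with a minimal sketch. It assumes each tested gene is a record with an age group, a duplication status, and an essentiality flag; the field names, the function `essential_proportions`, and the toy records are hypothetical and are not data from this study.

```python
from collections import defaultdict

def essential_proportions(genes):
    """Return {(age_group, status): P_E in percent} over the given genes."""
    counts = defaultdict(lambda: [0, 0])  # [number essential, number tested]
    for g in genes:
        key = (g["age"], g["status"])
        counts[key][0] += int(g["essential"])
        counts[key][1] += 1
    return {key: 100.0 * ess / total for key, (ess, total) in counts.items()}

# Toy records invented purely for illustration (NOT data from this study).
toy_genes = [
    {"age": "old", "status": "singleton", "essential": True},
    {"age": "old", "status": "singleton", "essential": True},
    {"age": "old", "status": "duplicate", "essential": True},
    {"age": "old", "status": "duplicate", "essential": False},
    {"age": "young", "status": "singleton", "essential": True},
    {"age": "young", "status": "singleton", "essential": False},
    {"age": "young", "status": "duplicate", "essential": False},
    {"age": "young", "status": "duplicate", "essential": False},
]

p_e = essential_proportions(toy_genes)
for (age, status), value in sorted(p_e.items()):
    print(f"{age:>5} {status:<9} P_E = {value:.1f}%")
```

Comparing singletons and duplicates only within the same age key is what separates the effect of duplication status from the effect of phyletic age.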

Gene duplicates have two ages: the age of the gene fam- 
ily (phyletic age; fig. 1) and the age of the most recent du- 
plication event (duplication age). The effect of phyletic age 
is likely similar between duplicates and singletons. In addi- 
tion, functional redundancy is expected to be strongly af- 
fected by the age of the duplication event, as duplicates 
derived from ancient duplications are more likely to be es- 
sential than genes derived from recent duplications (Su and 
Gu 2008). In mouse gene duplicates, essentiality reaches 
a plateau in the Fungi/Metazoa group and does not increase
further in the two older age groups. According to
the reasoning above, this plateau might be caused by a
comparatively young duplication age. We indeed find that the two
oldest groups contain higher fractions of younger dupli- 
cates than the "Fungi/Metazoa" group (Supplementary 
fig. S3, Supplementary Material online). 

In each phyletic age group, duplicates are less likely to 
be essential than singletons (fig. 1). Why then is the same 




Fig. 1. In both yeast (A) and mouse (C), genes with more recent phyletic origins are less likely to be essential, as are duplicated genes compared
with singletons of the same phyletic age. However, ignoring age, the overall proportion of essential genes in singletons is higher in yeast (B) but
lower in mouse (D) compared with duplications. Filled circles in (C) indicate that the proportion of essential genes in the corresponding
duplication groups is higher than or close to the overall P_E in singletons (41.1%; the dashed horizontal line), whereas hollow circles indicate
that P_E is lower.
not true when disregarding age, as done in previous
studies (Liao and Zhang 2007; Su and Gu 2008; Liang and Li
2009; Makino et al. 2009)? This is in fact an instance of 
Simpson's paradox (Simpson 1951), which can arise 
when the dependence of two categorical variables (essen- 
tiality and duplication status) on a third variable (phyletic 
age) is disregarded. To illustrate the mathematics behind 
this paradox, we divided the six age groups of duplicated
genes into two parts reflecting a very coarse definition of
age: one including four age groups (the "old part," filled
circles in fig. 1C) that mostly have higher P_E values than the
overall singletons and the other including the remaining
two groups with lower P_E values (the "young part," open circles
in fig. 1C). With this partitioning, the overall proportion of
essential genes in the old duplicate part
(P_E^old = 44.89%) is higher than that of the overall
singletons (P_E^singleton = 41.1%), whereas the corresponding
proportion in the young duplicate part
(P_E^young = 22.97%) is lower. The overall P_E of duplicates
regardless of age can be calculated from this as a weighted
average:

\[ P_E = f_{old} \times P_E^{old} + f_{young} \times P_E^{young}, \]

where f_old and f_young are the fractions of duplicates contained
in the "old" and "young" parts, respectively (with f_old + f_young
= 1) (for more details, see Supplementary text and
Supplementary table, Supplementary Material online). In theory,
the overall P_E could be as high as 44.89% or as low as
22.97%, depending on the values of f_old and f_young. In our
study, we found that the vast majority of duplicates were
derived from old gene families (f_old = 84.66%), resulting in an
overall P_E of 41.6% for duplicates (see Supplementary table,
Supplementary Material online). Thus, the surprising result
of a higher essentiality among mouse duplicates compared
with singletons is caused by the different age profiles of
singletons and duplicated gene families.
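
The weighted average can be checked numerically with the values reported above. The sketch below is only an illustration: the function and variable names are our own, and the break-even fraction is a simple rearrangement of the weighted-average formula rather than a quantity reported in the study.

```python
def overall_pe(f_old, pe_old, pe_young):
    """Weighted average P_E of duplicates; f_young is implied as 1 - f_old."""
    f_young = 1.0 - f_old
    return f_old * pe_old + f_young * pe_young

# Percentages quoted in the text for mouse.
PE_OLD = 44.89        # old duplicate part
PE_YOUNG = 22.97      # young duplicate part
PE_SINGLETON = 41.1   # all singletons
F_OLD = 0.8466        # fraction of duplicates in the old part

pe_dup = overall_pe(F_OLD, PE_OLD, PE_YOUNG)

# Fraction of old duplicates at which duplicates and singletons would have
# the same overall P_E (solve f * PE_OLD + (1 - f) * PE_YOUNG = PE_SINGLETON).
f_break_even = (PE_SINGLETON - PE_YOUNG) / (PE_OLD - PE_YOUNG)

print(f"overall P_E of duplicates: {pe_dup:.2f}%")
print(f"break-even f_old: {f_break_even:.4f}")
```

With these rounded inputs the weighted average comes out within about 0.1 percentage points of the reported 41.6%, and the break-even fraction is close to 0.83. Because the observed f_old (84.66%) lies above this threshold, the pooled P_E of duplicates exceeds that of singletons even though duplicates are less essential within each age group.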

Our results differ significantly from a recent publication 
on Drosophila melanogaster (Chen et al. 2010). Based on 
RNAi knockdowns of ~440 genes, Chen et al. found that
~30% of young genes (<35 myr) were essential compared
with ~35% of old genes (>40 myr). The authors concluded
that "young genes are as essential as old genes in terms of 
viability" (Chen et al. 2010). We reanalyzed their data using 
our methods, which differ from those of Chen et al. in age 
classification and in the separate analysis of duplicates and 
singletons (for the raw data, see Supplementary table, 
Supplementary Material online). We found that the 
proportion of essential genes in both singletons and 
duplicates in general increases with increasing age in the 
five age groups, with some fluctuations (Supplementary 
fig. S4, Supplementary Material online). However, the P_E
in duplicates is not always lower than in singletons of sim- 
ilar age, and the differences are not statistically significant 
(Fisher's exact test, all comparisons P > 0.05; Supplemen- 
tary table, Supplementary Material online), probably due to 
the small size of the data set. Since only a small fraction
(~3.2%) of D. melanogaster genes have been tested (Chen
et al. 2010), our findings regarding D. melanogaster are not
yet conclusive.
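
To show why small samples limit such comparisons, the following sketch implements a two-sided Fisher's exact test for a 2x2 table using only the Python standard library. The function name and the counts are hypothetical (they are not the Drosophila counts analyzed here), and the two-sided rule used, summing all tables no more probable than the observed one, is one common convention that we assume for illustration.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are at most as probable as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical counts for one age group (NOT data from this study):
# rows = singletons, duplicates; columns = essential, nonessential.
p_value = fisher_exact_two_sided(6, 14, 4, 16)
print(f"P = {p_value:.3f}")
```

In this invented example, 30% versus 20% essential genes in groups of 20 does not reach significance, which illustrates how modest differences can remain undetected when few genes per group have been tested.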



Materials and Methods 

We determined the phyletic origins of genes from yeast, 
mouse, and fly using a method described in Wolf et al. 
(2009) with modifications (for more details, see Supple- 
mentary text, Supplementary Material online, and for 
the results, Supplementary table, Supplementary Material 
online). We separated genes into singletons and duplicates 
as previously described (Liao and Zhang 2007; Makino et al. 
2009). We grouped duplicates into gene families using 
a clustering-based method (Markov cluster algorithm 
[MCL]; Enright et al. 2002) and then used the most ancient 
origin of all members as the age of the corresponding family. 
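
The family-age rule in the last step can be sketched as follows, assuming that clustering has already produced a mapping from families to member genes. The age-group labels, gene and family identifiers, and the function `assign_family_ages` are placeholders for illustration; the MCL clustering itself is not reproduced.

```python
# Age groups ordered from oldest to youngest (placeholder labels).
AGE_ORDER = ["group_1_oldest", "group_2", "group_3_youngest"]
RANK = {name: i for i, name in enumerate(AGE_ORDER)}

def assign_family_ages(families, gene_age):
    """Give each family the most ancient phyletic origin among its members."""
    result = {}
    for family, members in families.items():
        ages = [gene_age[g] for g in members if g in gene_age]
        if ages:  # skip families in which no member has a valid age
            result[family] = min(ages, key=RANK.__getitem__)
    return result

# Toy input invented for illustration (NOT data from this study).
toy_families = {"famA": ["g1", "g2"], "famB": ["g3"], "famC": ["g9"]}
toy_gene_age = {
    "g1": "group_3_youngest",
    "g2": "group_1_oldest",
    "g3": "group_2",
}

family_ages = assign_family_ages(toy_families, toy_gene_age)
print(family_ages)
```

Here famA inherits the oldest origin of its two members, and famC is left out because its only member has no assigned age.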

We obtained the phenotypic data for the three species
from the online gene essentiality database (Chen et al. 2012);
these data were originally published by the Saccharomyces
Genome Deletion Project (Cherry et al. 1997), Mouse
Genome Informatics (Blake et al. 2011), and the authors of
Chen et al. (2010), respectively. We restricted further anal- 
yses to genes that were tested in these phenotypic data 
sets. 

Supplementary Material 

Supplementary text, Supplementary figures, and Supple- 
mentary table are available at Molecular Biology and 
Evolution online (http://www.mbe.oxfordjournals.org/). 